# 06. Beyond Hill Climbing

In the previous video, you learned about the hill climbing algorithm.

We denoted the expected return by J and used \theta to refer to the weights in the policy network. Since \theta encodes the policy, which determines how much reward the agent is likely to receive, J is a function of \theta.
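For reference, one standard way to write this dependence (not spelled out in this section) is as an expectation over trajectories:

J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right]

where \tau denotes a trajectory generated by following the policy with weights \theta, and R(\tau) is the cumulative reward collected along that trajectory.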

Even though we don't know what the function J = J(\theta) looks like, the hill climbing algorithm helps us find the value of \theta that maximizes it. Watch the video below to learn about some improvements you can make to the hill climbing algorithm!

Note: We refer to the general class of approaches that find \arg\max_{\theta}J(\theta) by randomly perturbing the most recent best estimate as stochastic policy search. We also refer to J as an objective function, which just means that we'd like to maximize it!
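To make the idea concrete, here is a minimal sketch of stochastic policy search by hill climbing. Because we can't write down the true J(\theta), the code below uses a noisy synthetic objective purely as a stand-in; in practice, evaluating J(\theta) means running episodes with the policy whose weights are \theta and averaging the returns. The dimensions, noise scale, and iteration count are illustrative assumptions, not values from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_opt = rng.normal(size=4)  # hypothetical "best" weights, unknown to the agent

def estimate_J(theta):
    """Stand-in for estimating J(theta): a noisy quadratic bowl.
    In a real agent this would run one or more episodes with the
    policy parameterized by theta and return the average return."""
    return -np.sum((theta - theta_opt) ** 2) + rng.normal(scale=0.1)

def hill_climbing(n_iterations=1000, noise_scale=0.5):
    """Randomly perturb the best weights found so far; keep the
    perturbation only if it improves the estimated objective."""
    best_theta = rng.normal(size=4)          # random initial weights
    best_J = estimate_J(best_theta)
    for _ in range(n_iterations):
        candidate = best_theta + noise_scale * rng.normal(size=4)
        candidate_J = estimate_J(candidate)
        if candidate_J > best_J:             # accept only improvements
            best_theta, best_J = candidate, candidate_J
    return best_theta, best_J

theta, J = hill_climbing()
print("estimated best J:", J)
```

The video's improvements build on this same loop, for example by changing how the candidate weights are perturbed or how many candidates are evaluated per iteration.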

## Video

*Video: M2L3 04 V1*